The Mann-Whitney U test, also known as the Wilcoxon rank-sum test, is a non-parametric test used to compare two independent groups when the data cannot be assumed to be normally distributed. It is often used in that situation as an alternative to the independent samples t-test.
Mann-Whitney U Test
Purpose: The Mann-Whitney U Test is used to compare two independent groups when the dependent variable is ordinal, or continuous but not normally distributed. It is the non-parametric alternative to the independent two-sample t-test.
How it Works:
The test ranks all the values from both groups together. The ranks are then used to calculate the U statistic, which counts, over all pairs made of one score from each group, how many times the score from one group precedes the score from the other group (ties count as one half).
The test essentially assesses whether one group tends to have higher or lower values than the other, without assuming a specific distribution of the scores.
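To make the pair-counting idea concrete, here is a minimal Python sketch (standard library only) that computes U directly from its definition, using the example scores from later in this section. The function name `u_by_pair_counting` is hypothetical. It compares every score in one group with every score in the other, adding 1 for a win and 0.5 for a tie, so the two U values always sum to \(n_1 n_2\).

```python
from itertools import product

def u_by_pair_counting(x, y):
    """Hypothetical helper: count pairs (xi, yj) with xi > yj; a tie adds one half."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u

# Example scores used later in this section
group_a = [120, 101, 130, 115, 100, 130]
group_b = [85, 90, 110, 115, 120, 125]

u_a = u_by_pair_counting(group_a, group_b)  # pairs in which the Group A score is larger
u_b = u_by_pair_counting(group_b, group_a)  # pairs in which the Group B score is larger
print(u_a, u_b, u_a + u_b)  # 24.0 12.0 36.0
```

This brute-force count needs \(n_1 n_2\) comparisons; the rank-sum formula gives the same numbers without forming every pair.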
36.1.1 Assumptions
The Mann-Whitney U test is based on the following assumptions:
Independence of Samples: The samples from the two groups must be independent of each other.
Ordinal Data: The data do not need to be normally distributed, but should be ordinal or continuous.
Similarity of Shape: For a significant result to be interpreted as a difference in medians, the distributions of the two groups should have the same shape, differing only in location.
36.1.2 Hypotheses
The hypotheses for the Mann-Whitney U test are framed as follows:
Null Hypothesis (H₀): There is no difference in the medians of the two groups.
Alternative Hypothesis (H₁): There is a difference in the medians of the two groups.
36.1.3 Formula
The U statistic is calculated by first ranking all the data from both groups together. Each data point gets a rank, and the ranks for each group are summed. The U statistics for the two groups are then computed from these rank sums:
\[
U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_2 = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1 = n_1n_2 - U_1
\]
Where:
\(n_1\) and \(n_2\) are the sample sizes of the two groups.
\(R_1\) is the sum of the ranks in the first group.
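A short counting argument shows where the rank-sum formula comes from (stated for untied data; with average ranks it still holds when each tie counts as one half). Sort the pooled data and let \(r_{(i)}\) be the rank of the \(i\)-th smallest value in group 1. Exactly \(r_{(i)} - 1\) values lie below it, and \(i - 1\) of those belong to group 1, so \(r_{(i)} - i\) of them belong to group 2. Summing over group 1 counts every pair in which the group-1 value is larger:

\[
U_1 = \sum_{i=1}^{n_1} \bigl(r_{(i)} - i\bigr) = R_1 - \frac{n_1(n_1+1)}{2},
\qquad
U_2 = n_1n_2 - U_1 = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1 .
\]

So the expression \(n_1n_2 + n_1(n_1+1)/2 - R_1\) is the U count for group 2, and because \(U_1 + U_2 = n_1n_2\), either value determines the other.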
36.1.4 Calculation Steps
Combine all observations from both groups into a single dataset.
Rank all observations from the lowest to the highest, handling ties by assigning to each tied value the average of the ranks they would have otherwise occupied.
Calculate the sum of ranks for each group.
Use the sum of ranks to compute the U statistic for each group.
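The four calculation steps can be sketched in plain Python as follows. `average_ranks` and `mann_whitney_u` are hypothetical helper names; SciPy users could do the ranking with `scipy.stats.rankdata`, whose default `method='average'` gives tied values their mean rank. Each group's U is its rank sum minus \(n(n+1)/2\) for that group's size \(n\), which is the number of pairs that group wins, ties counting one half.

```python
def average_ranks(values):
    """Rank values from lowest to highest (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U_x, U_y) computed from rank sums."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)        # Step 1: combine both groups
    ranks = average_ranks(combined)     # Step 2: rank, averaging ties
    r1 = sum(ranks[:n1])                # Step 3: rank sum for each group
    r2 = sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2         # Step 4: U for each group
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2

group_a = [120, 101, 130, 115, 100, 130]
group_b = [85, 90, 110, 115, 120, 125]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b, min(u_a, u_b))  # 24.0 12.0 12.0
```

As a check, \(U_A + U_B = 24 + 12 = 36 = n_1 n_2\).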
36.1.5 Interpretation
The smaller of the two U values is used as the test statistic. It is compared with a critical value from a table of the Mann-Whitney U distribution (for large samples, a normal approximation is used instead). If the calculated U is less than or equal to the critical value, or if the p-value is less than the chosen alpha level, the null hypothesis is rejected, indicating a significant difference between the groups.
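The tabled critical values come from the exact distribution of U under H₀, in which every way of assigning the pooled observations to groups of sizes \(n_1\) and \(n_2\) is equally likely. For small samples that distribution can simply be enumerated. The sketch below assumes a two-sided test in which "at least as extreme" means at least as far from the null mean \(n_1n_2/2\); `u_count` and `exact_u_test` are hypothetical names. With tied data, such as the example below, this gives a permutation p-value conditional on the observed values, which need not match the normal-approximation p-values that R and Python report for that example.

```python
from itertools import combinations
from math import comb

def u_count(x, y):
    """U for x: pairs (xi, yj) with xi > yj, each tie counting one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_u_test(x, y):
    """Two-sided p-value from the permutation null distribution of U.

    Every choice of len(x) pooled observations as "group 1" is treated as
    equally likely under H0; the p-value is the share of splits whose U lies
    at least as far from n1*n2/2 as the observed U.
    """
    pooled = list(x) + list(y)
    n, n1, n2 = len(pooled), len(x), len(y)
    mean_u = n1 * n2 / 2
    observed_dist = abs(u_count(x, y) - mean_u)
    n_extreme = 0
    n_splits = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        n_splits += 1
        if abs(u_count(gx, gy) - mean_u) >= observed_dist:
            n_extreme += 1
    assert n_splits == comb(n, n1)  # sanity check: every split was enumerated
    return n_extreme / n_splits, n_splits

group_a = [120, 101, 130, 115, 100, 130]
group_b = [85, 90, 110, 115, 120, 125]
p_exact, n_splits = exact_u_test(group_a, group_b)
print(n_splits, round(p_exact, 3))
```

Enumeration grows as \(\binom{n_1+n_2}{n_1}\) (924 splits here), which is why larger samples fall back on the normal approximation.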
36.1.6 Example Problem
Consider two groups of patients treated with different methods to reduce symptoms. Group A and Group B each consist of 6 patients. Their scores are:
Group A: 120, 101, 130, 115, 100, 130
Group B: 85, 90, 110, 115, 120, 125
Hypotheses:
Null Hypothesis (H₀): The median symptom reduction is the same for both treatments.
Alternative Hypothesis (H₁): The median symptom reduction differs between the treatments.
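Working through the calculation steps by hand for these scores: pooling and sorting the 12 values, with tied values sharing the average of their ranks, gives

\[
\begin{array}{l|cccccccccccc}
\text{Score} & 85 & 90 & 100 & 101 & 110 & 115 & 115 & 120 & 120 & 125 & 130 & 130 \\
\text{Group} & B & B & A & A & B & A & B & A & B & B & A & A \\
\text{Rank} & 1 & 2 & 3 & 4 & 5 & 6.5 & 6.5 & 8.5 & 8.5 & 10 & 11.5 & 11.5
\end{array}
\]

\[
R_A = 3 + 4 + 6.5 + 8.5 + 11.5 + 11.5 = 45, \qquad R_B = 1 + 2 + 5 + 6.5 + 8.5 + 10 = 33
\]

\[
U_A = R_A - \frac{6 \cdot 7}{2} = 45 - 21 = 24, \qquad U_B = n_A n_B - U_A = 36 - 24 = 12
\]

The smaller value, \(U = 12\), is the test statistic; the formula in the Formula subsection, with Group A as group 1, gives the same value: \(36 + 21 - 45 = 12\). The larger value, \(U_A = 24\), is what R reports as W and SciPy reports as the U statistic for the first sample.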
In R, the Mann-Whitney U test is performed with the wilcox.test function, which calls it the Wilcoxon rank sum test when it is applied to two independent samples. The naming can cause confusion, but the two are the same test; the W statistic that R reports equals the U statistic for the first sample.
```r
# Scores for two groups
group_a <- c(120, 101, 130, 115, 100, 130)
group_b <- c(85, 90, 110, 115, 120, 125)

# Perform Mann-Whitney U test
mw_test <- wilcox.test(group_a, group_b)

# Print the results
print(mw_test)
```

```
Wilcoxon rank sum test with continuity correction

data: group_a and group_b
W = 24, p-value = 0.376
alternative hypothesis: true location shift is not equal to 0
```
36.1.9 Mann-Whitney U Test using Python

```python
from scipy.stats import mannwhitneyu

# Scores for two groups
group_a = [120, 101, 130, 115, 100, 130]
group_b = [85, 90, 110, 115, 120, 125]

# Perform Mann-Whitney U test
u_statistic, p_value = mannwhitneyu(group_a, group_b, alternative='two-sided')

# Print the results
print("U Statistic:", u_statistic, "P Value:", p_value)
```

```
U Statistic: 24.0 P Value: 0.3759621824832893
```
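The reported p-value of about 0.376 can be reproduced with the large-sample normal approximation, using a tie-corrected variance for U and a continuity correction of 0.5 (R's output notes the continuity correction). The sketch below uses only the standard library; `u_normal_approx` is a hypothetical name, and the tie-corrected variance formula is the standard one, assumed here rather than taken from either package's source.

```python
from collections import Counter
from math import erfc, sqrt

def u_normal_approx(u, n1, n2, tie_sizes, continuity=True):
    """Two-sided p-value for U from the normal approximation.

    tie_sizes lists how many observations share each tied value; the variance
    of U is reduced accordingly. With continuity=True, 0.5 is subtracted from
    |U - mean| before standardizing.
    """
    n = n1 + n2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    diff = abs(u - mean_u)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / sqrt(var_u)
    return z, erfc(z / sqrt(2))  # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z))

group_a = [120, 101, 130, 115, 100, 130]
group_b = [85, 90, 110, 115, 120, 125]
# Sizes of tied groups in the pooled data (115, 120 and 130 each appear twice)
tie_sizes = [c for c in Counter(group_a + group_b).values() if c > 1]
z, p_approx = u_normal_approx(24, len(group_a), len(group_b), tie_sizes)
print(round(p_approx, 3))  # 0.376
```

Passing the smaller value, U = 12, gives the same p-value, because both 12 and 24 lie 6 away from the null mean of 18.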
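In this example, p ≈ 0.376 is well above the conventional α = 0.05, so the null hypothesis is not rejected: the data do not provide evidence that the median symptom reduction differs between the two treatments. More generally, the test allows researchers and analysts to assess the evidence against the null hypothesis in a way that is robust to non-normal data distributions.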